Apparatus and method for reproducing an audio signal, apparatus and method for generating a coded audio signal, computer program and coded audio signal

ABSTRACT

An apparatus for reproducing an audio signal includes a first reproducer configured to reproduce a first portion of the audio signal in a first frequency band based on the first data. A provider is configured to provide a patch signal in a second frequency band, wherein the patch signal is at least partially uncorrelated with respect to the first portion of the audio signal or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band. A second reproducer is configured to reproduce a second portion of the audio signal in the second frequency band based on second data and the patch signal. A combiner is configured to combine the reproduced first portion of the audio signal and the patch signal.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of copending International Application No. PCT/EP2013/067730, filed Aug. 27, 2013, which is incorporated herein by reference in its entirety, and additionally claims priority from U.S. patent application Ser. No. 61/693,575, filed Aug. 27, 2012, as well as European Patent Application No. 12187265, filed Oct. 4, 2012, all of which are incorporated herein by reference in their entirety.

The present invention relates to an apparatus, a method and a computer program for reproducing an audio signal and, in particular, to an apparatus, a method and a computer program for reproducing an audio signal in situations in which the available data rate is reduced. In addition, the present invention relates to an apparatus, a method and a computer program for generating a coded audio signal and a corresponding coded audio signal.

BACKGROUND OF THE INVENTION

The perceptually adapted encoding of audio signals, for efficient storage and transmission of these data rate reduced signals, has gained acceptance in many fields. Encoding algorithms are known, in particular as MPEG-1/2, layer 3 “MP3”, MPEG-2/4 Advanced Audio Coding (AAC) or MPEG-H Unified Speech and Audio Coding (USAC). The underlying coding techniques, in particular when achieving lowest bit rates, lead to a reduction of the audio quality. The impairment is often mainly caused by an encoder side limitation of the audio signal bandwidth to be transmitted.

In such a situation, it is known state-of-the-art to subject the audio signal to a band limiting on the encoder side, and to encode only a lower band of the audio signal by means of a high quality audio encoder. The upper band, however, is only very coarsely characterized by a set of parameters, which convey e.g. the spectral envelope of the upper band. On the decoder side, the upper band is then synthesized by patching the decoded lower band signal into the otherwise empty upper band and performing subsequent parameter controlled adjustments.

Standard methods for a bandwidth extension of band-limited audio signals use a copying function of low-frequency signal portions (LF) into the high frequency range (HF), in order to approximate information missing due to the band limitation. In principle, such a copying function is technically equivalent to a spectral shift computed in time domain by means of single sideband (SSB) modulation, but computationally much less complex. Such methods, like Spectral Band Replication (SBR), are described in M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, May 2002; S. Meltzer, R. Böhm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as “Digital Radio Mondiale” (DRM),” 112th AES Convention, Munich, May 2002; T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, May 2002; International Standard ISO/IEC 14496-3:2001/FPDAM 1, “Bandwidth Extension,” ISO/IEC, 2002, or “Speech bandwidth extension method and apparatus”, Vasu Iyengar et al. U.S. Pat. No. 5,455,888.

In these methods no harmonic transposition is performed, but successive bandpass signals of the lower band are introduced into successive filterbank channels of the upper band. By this, a coarse approximation of the upper band of the audio signal is achieved. This coarse approximation of the signal is then in a further step approximated to the original by a post processing using control information gained from the original signal. Here, e.g. scale factors serve for adapting the spectral envelope, an inverse filtering and the addition of a noise floor for adapting tonality and a supplementation by sinusoidal signal portions, as it is also described in the MPEG-4 Standard.

It is known from harmonic bandwidth extension techniques described in Nagel, F.; Disch, S. A Harmonic Bandwidth Extension Method for Audio Codecs, IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2009; Nagel, F.; Disch, S.; Rettelbach, N. A Phase Vocoder Driven Bandwidth Extension Method with Novel Transient Handling for Audio Codecs, 126th AES Convention, 2009; Zhong, H.; Villemoes, L.; Ekstrand, P. et al. QMF Based Harmonic Spectral Band Replication, 131st Audio Engineering Society Convention, 2011; Villemoes, L.; Ekstrand, P.; Hedelin, P. Methods for enhanced harmonic transposition, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, (WASPAA), 2011, that in synthesizing the upper band unwanted auditory roughness might be introduced into the signal. One cause (out of many) of said roughness is spectral misalignment of the patch and/or dissonance effects in the transition regions between lower band and first patch or between consecutive patches. Harmonic bandwidth extension techniques are designed to improve on these two aspects, albeit at the expense of computational complexity.

Filterbank calculations and patching in the filterbank domain, especially in harmonic bandwidth extension, may indeed require a high computational effort. In WO 98/57436 an advanced patching technique is described which can, to some limited extent, avoid dissonance effects by introducing so-called guard bands between different spectral patches and by performing a modified copy-up patching to lessen spectral misalignment while keeping computational complexity moderate.

Apart from this, further methods exist such as the so-called “blind bandwidth extension”, described in E. Larsen, R. M. Aarts, and M. Danessis, “Efficient high-frequency bandwidth extension of music and speech”, In AES 112th Convention, Munich, Germany, May 2002 wherein no information on the original HF range is used. Further, also the method of the so-called “Artificial bandwidth extension”, exists which is described in K. Käyhkö, A Robust Wideband Enhancement for Narrowband Speech Signal; Research Report, Helsinki University of Technology, Laboratory of Acoustics and Audio signal Processing, 2001.

In J. Mäkinen et al.: AMR-WB+: a new audio coding standard for 3rd generation mobile audio services Broadcasts, IEEE, ICASSP '05, a method for bandwidth extension is described, wherein the copying operation of the bandwidth extension with an up-copying of successive bandpass signals according to SBR technology is replaced by mirroring, for example, by upsampling.

Further technologies for bandwidth extension are described in the following documents. R. M. Aarts, E. Larsen, and O. Ouweltjes, “A unified approach to low-and high frequency bandwidth extension”, AES 115th Convention, New York, USA, October 2003; E. Larsen and R. M. Aarts, “Audio Bandwidth Extension: Application to psychoacoustics, Signal Processing and Loudspeaker Design”, John Wiley & Sons, Ltd., 2004; E. Larsen, R. M. Aarts, and M. Danessis, “Efficient high-frequency bandwidth extension of music and speech”, AES 112th Convention, Munich, May 2002; J. Makhoul, “Spectral Analysis of Speech by Linear Prediction”, IEEE Transactions on Audio and Electroacoustics, AU-21(3), June 1973; U.S. patent application Ser. No. 08/951,029; U.S. Pat. No. 6,895,375.

Known methods of harmonic bandwidth extension show a high complexity. On the other hand, methods of complexity-reduced bandwidth extension show quality losses. In particular with a low bitrate and in combination with a low bandwidth of the LF range, artifacts such as roughness and a timbre perceived to be unpleasant may occur. A reason for this is primarily the fact that the approximated HF portion is based on one or more direct copy or mirror operations of the LF portion of the spectrum.

SUMMARY

According to an embodiment, an apparatus for reproducing an audio signal based on first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band including frequencies higher than the first frequency band, may have: a first reproducer configured to reproduce the first portion of the audio signal based on the first data; a provider configured to provide a patch signal in the second frequency band, wherein the patch signal is at least partially uncorrelated with respect to the first portion of the audio signal or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band; a second reproducer representing a post-processor and configured to reproduce the second portion of the audio signal in the second frequency band based on the second data and the patch signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data; and a combiner to combine the reproduced first portion of the audio signal and the patch signal before the second portion of the audio signal is reproduced by the second reproducer or to combine the reproduced first portion of the audio signal and the reproduced second portion of the audio signal.

According to another embodiment, a method for reproducing an audio signal based on first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band including frequencies higher than the first frequency band, may have the steps of: reproducing the audio signal in the first frequency band based on the first data; providing a patch signal in the second frequency band, wherein the patch signal is at least partially uncorrelated with respect to the first portion of the audio signal or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band; reproducing the second portion of the audio signal in the second frequency band based on the second data and the patch signal by means of a post-processor, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data; and combining the reproduced first portion of the audio signal and the patch signal before the second portion of the audio signal is reproduced or combining the reproduced first portion of the audio signal and the reproduced second portion of the audio signal.

According to another embodiment, an apparatus for generating a coded audio signal, the coded audio signal including first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band including frequencies higher than the first frequency band, may have: a decorrelation information adder configured to add to the coded audio signal in addition to the first data and the second data information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion of the audio signal is reproduced by means of a post-processor when reproducing the audio signal from the coded audio signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data.

According to another embodiment, a method for generating a coded audio signal, the coded audio signal including first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band including frequencies higher than the first frequency band, may have the steps of: adding to the coded audio signal in addition to the first data and the second data information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion of the audio signal is reproduced by means of a post-processor when reproducing the audio signal from the coded audio signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data.

According to another embodiment, a computer program may have a program code for performing a method according to claim 11 when the computer program runs on a computer.

According to another embodiment, a computer program may have a program code for performing a method according to claim 13 when the computer program runs on a computer.

Embodiments of the invention relate to a reproduction of an audio signal providing for a bandwidth extension using decorrelated sub-band audio signals. In contrast to already existing methods, most of the signal distortions and artifacts, which currently are typical for bandwidth extensions, may be avoided by using decorrelated sub-band audio signals for bandwidth extension, rather than correlated (copied-up or mirrored) sub-band audio signals. This is achieved by providing the audio signal, which forms the basis for a reproduction of a high-frequency portion of the audio signal, uncorrelated or decorrelated with respect to the first portion (LF portion) of the audio signal. Embodiments of the invention are based on the recognition that the correlation between the low frequency portion and the high frequency portion need not be maintained when reproducing the second signal portion of the audio signal. Rather, the inventors recognized that artifacts, such as roughness and a timbre perceived to be unpleasant may be avoided by making use of a decorrelated or completely uncorrelated patch signal.

Embodiments of the invention provide for a coded audio signal comprising:

first data representing a coded version of a first portion of the audio signal in a first frequency band; second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band comprising frequencies higher than the first frequency band; and

information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion of the audio signal is reproduced when reproducing the audio signal from the coded audio signal.

Thus, embodiments of the invention permit generating a coded audio signal in a manner which permits decoding the coded audio signal in an appropriate manner using an appropriate degree of decorrelation. The appropriate degree of decorrelation may be determined at the encoder side based on properties of the first portion and/or the second portion of the audio signal.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

FIG. 1 a shows a block diagram of an embodiment of an apparatus for reproducing an audio signal;

FIG. 1 b shows a block diagram of another embodiment of an apparatus for reproducing an audio signal;

FIG. 2 shows a block diagram of a further embodiment of an apparatus for reproducing an audio signal;

FIG. 3 shows a block diagram of an embodiment of an apparatus for generating a coded audio signal;

FIG. 4 a shows a schematical illustration of an encoder side in the context of embodiments of the invention;

FIG. 4 b shows a schematical illustration of a decoder-side in the context of embodiments of the invention;

FIGS. 5 a and 5 b show diagrams illustrating advantages of embodiments of the invention;

FIG. 6 shows a block diagram of an apparatus for reproducing an audio signal from which the invention starts; and

FIGS. 7 a to 7 d show signal diagrams useful in explaining the operation of the apparatus shown in FIG. 6.

DETAILED DESCRIPTION OF THE INVENTION

Prior to explaining embodiments of the invention in detail, it is regarded worthwhile to briefly discuss the theoretical considerations underlying the invention.

As explained above, bandwidth extensions based on copy operations (or mirror operations), such as for example SBR (SBR=spectral band replication), copy large parts of an LF spectrum directly into the HF range.

An example of an SBR apparatus is described referring to FIGS. 6 and 7. The envelope of an audio signal 2 is shown in FIG. 7 a. Audio signal 2 comprises a low-frequency portion (or low-frequency band) 4 and a high-frequency portion (or high-frequency band) 6. Typically, in perceptual coding of audio signals, the low-frequency portion 4 is coded by means of a high quality audio encoder, such as a PCM encoder (PCM=pulse code modulation), while the upper band is only very coarsely characterized by side information. Data representing the coded low-frequency portion and data representing the side information are transmitted using a corresponding core codec. FIG. 6 shows a baseband signal 8 from a core codec, which represents the low-frequency portion 4 shown in FIG. 7 b. This signal 8 is applied to a single sideband modulation/copy-up unit, in which signal 8 is shifted to the frequency range of the high-frequency portion 6. This shifted signal is shown as signal 10 in FIG. 7 c. Shifted signal 10 and signal 8 are applied to a patching unit 12, in which both signals are combined (added) to obtain the spectrum shown in FIG. 7 c. The signal portion 8 may be shifted into p different higher frequency ranges, wherein p≥1. Thus, a combination of one or more (p) shifted signals and signal 8 may take place in patching unit 12.

The output signal of patching unit 12 is applied to a post-processing unit 14, which also receives side information 16 representing the audio signal in the high-frequency portion 6. Thus, the high frequency portion 10′ of the audio signal 6 is reproduced based on the side information 16 and the audio signal of the low-frequency portion 4. The resulting audio signal is shown in FIG. 7 d. Post-processing unit 14 outputs the full band output covering the frequency ranges of the low-frequency portion 4 and the high-frequency portion 6.

Accordingly, bandwidth extensions based on copy operations (or mirror operations), such as for example SBR, copy large parts of a low-frequency spectrum directly into the high-frequency range. This may be achieved by employing a single-sideband modulation of the time-domain representation of the audio signal or by a direct copy process (copy-up) in the spectral representation of the audio signal. This processing step is usually called “patching”.

Generally, there may be a plurality of patches copied into different high frequency bands. The respective frequency bands may overlap or not. Each of the corresponding HF patches thus is completely correlated to the low-frequency range from which it has been extracted. The inventors recognized that, thereby, temporal envelope modulations may occur by superimposing both signals with a frequency that depends on the spectral distance between the LF band and the spectral location of the respective HF patch.

From a system-theoretical point of view, this phenomenon is to be regarded as dual to the operation of a finite impulse response (FIR) comb filter comprising a delay of n samples with Fs as sample frequency. This filter has a magnitude frequency response with a comb width (spectral distance between two maxima of the magnitude frequency response) of Fs/n. Thereby, the system-theoretical duality has the following direct correspondences:

- time delay <-> frequency translation
- magnitude frequency response <-> temporal envelope.
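The duality stated above can be checked numerically. The following is a minimal sketch, not part of the described apparatus: it assumes the comb filter y[k] = x[k] + x[k - n] and models a fully correlated patch as an analytic signal multiplied by a complex exponential. The function names, the sample rate and the FFT size are hypothetical choices made for illustration only.

```python
import numpy as np

def comb_peak_spacing_hz(n_delay, fs, n_fft=4096):
    """Spacing (Hz) between maxima of |H(f)| for y[k] = x[k] + x[k - n_delay]."""
    h = np.zeros(n_delay + 1)
    h[0] = 1.0
    h[n_delay] = 1.0
    mag = np.abs(np.fft.rfft(h, n_fft))
    # Bins at which the magnitude response reaches its maximum.
    peaks = np.flatnonzero(mag > mag.max() - 1e-6)
    return float(np.mean(np.diff(peaks)) * fs / n_fft)

def envelope_period_samples(shift_hz, fs, num=4800):
    """Period (samples) of the envelope factor |1 + exp(j*2*pi*shift*t)|.

    The factor results from adding an analytic signal a(t) and its copy
    shifted by shift_hz: |a + a*exp(j*2*pi*shift*t)| = |a| * factor.
    """
    t = np.arange(num) / fs
    factor = np.abs(1.0 + np.exp(2j * np.pi * shift_hz * t))
    # The factor is periodic; measure the distance between its zeros.
    zeros = np.flatnonzero(factor < 1e-6)
    return float(np.mean(np.diff(zeros)))

fs = 48000
spacing = comb_peak_spacing_hz(16, fs)        # time delay -> spectral comb
period = envelope_period_samples(spacing, fs)  # frequency shift -> temporal comb
```

Under these assumptions a delay of n samples yields maxima spaced Fs/n apart in frequency, while a frequency translation by Fs/n yields an envelope that repeats every n samples, which mirrors the two correspondences listed above.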

The inventors recognized that the temporal modulations resulting therefrom are audible in a disturbing manner and can be made visible in the autocorrelation function of the waveform magnitude in the form of periodically repeating side maxima. Such periodically repeating side maxima in the autocorrelation sequence of a noise signal envelope for copy-up SBR are shown in FIG. 5 a. FIG. 5 a shows the autocorrelation function of the magnitude envelope of white noise, wherein the bandwidth is extended with three direct copy-up patches, which are fully correlated among each other and with the LF band.
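The effect described for FIG. 5 a can be approximated with a small experiment. This is only a sketch under stated assumptions: the noise is synthesized directly in the FFT domain with flat magnitude and random phase, the signal length N and band width B are arbitrary, and decorrelation is modeled by a hypothetical random phase per bin rather than by any decorrelator named in the source. It does not reproduce the actual figure data.

```python
import numpy as np

N = 65536      # signal length in samples (assumed)
B = 1024       # width of the LF band in FFT bins (assumed)
PATCHES = 3    # number of HF patches, as in the example of FIG. 5 a

def envelope_acf(one_sided_spectrum, n=N):
    """Normalized circular autocorrelation of the magnitude envelope."""
    full = np.zeros(n, dtype=complex)
    full[:len(one_sided_spectrum)] = one_sided_spectrum
    # Only positive-frequency bins are set, so the inverse FFT is an
    # analytic signal and its magnitude is the envelope.
    env = np.abs(np.fft.ifft(full))
    env = env - env.mean()
    acf = np.fft.ifft(np.abs(np.fft.fft(env)) ** 2).real
    return acf / acf[0]

def side_maximum(decorrelate, seed=0):
    """Envelope autocorrelation at the lag set by the patch spacing."""
    rng = np.random.default_rng(seed)
    lf = np.exp(2j * np.pi * rng.random(B))   # LF band: white noise
    bands = [lf]
    for _ in range(PATCHES):
        patch = lf.copy()                     # direct copy-up patch
        if decorrelate:
            # Assumed decorrelator: independent random phase per bin.
            patch = patch * np.exp(2j * np.pi * rng.random(B))
        bands.append(patch)
    spectrum = np.concatenate([np.zeros(1, dtype=complex)] + bands)
    lag = N // B   # patches are B bins apart -> envelope period N/B samples
    return envelope_acf(spectrum)[lag]

copy_up_value = side_maximum(decorrelate=False)
decorrelated_value = side_maximum(decorrelate=True)
```

With fully correlated patches the autocorrelation shows a pronounced side maximum at the lag N/B, whereas with the assumed phase randomization the value at that lag stays close to zero.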

A maximum modulation depth is achieved only when the LF and the HF signal show the same amplitude. In practice, the modulation effect therefore is often slightly lower, because typically the HF range is markedly quieter (less loud) than the LF range. Noise-like signals or quasi-stationary signals with a pronounced overtone structure are to be regarded as particularly critical with respect to the modulation artifacts.
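The dependence of the modulation depth on the level ratio can be made explicit with a short worked equation. This is a sketch under simplifying assumptions that are not stated in the source: a single patch, the LF portion represented by its analytic signal a(t), and the patch being a copy of a(t) shifted by Δf and scaled by a gain g with 0 ≤ g ≤ 1. The symbols a, g, Δf, e and m are introduced here for illustration only.

```latex
\begin{align}
e(t) &= \bigl| a(t) + g\,a(t)\,e^{j 2\pi \Delta f\, t} \bigr|
      = |a(t)|\,\sqrt{1 + g^{2} + 2g\cos(2\pi \Delta f\, t)}, \\
m &= \frac{(1+g) - (1-g)}{(1+g) + (1-g)} = g .
\end{align}
```

The modulating factor swings between 1 - g and 1 + g, so the modulation depth m equals g and reaches its maximum of 1 only for g = 1, i.e. for equal LF and HF amplitudes.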

For the presence of several patches (p in FIG. 6) that are entirely correlated among each other, the above-mentioned duality is valid as well, of course. A temporal modulation of the magnitude envelope appears that is dual to the magnitude frequency response of a corresponding FIR filter.

Thus, according to embodiments of the invention, the patch or the patches are decorrelated from each other and from the LF band. In embodiments of the invention, one or more decorrelators are used that decorrelate the signal derived from the low-frequency signal components, respectively, before it is inserted into the higher frequency range(s) and, as the case may be, post-processed.

Embodiments of the invention avoid the explained problems that occur due to a copy operation or a mirror operation by using mutually decorrelated patches. In embodiments of the invention, the respective HF patches are decorrelated from the LF band in an individual manner using decorrelators, for example by means of all-pass filters or other known decorrelation methods, or the patches are created synthetically in a naturally decorrelated manner right away.
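As an illustration of decorrelation by all-pass filtering, the following sketch cascades three Schroeder all-pass sections. The source names all-pass filters only as one example of a decorrelator and gives no structure or coefficients; the filter form, the delays (7, 19, 41) and the gain 0.5 are assumptions chosen for illustration.

```python
import numpy as np

def allpass(x, delay, g):
    """Schroeder all-pass: y[k] = -g*x[k] + x[k-delay] + g*y[k-delay]."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        xd = x[k - delay] if k >= delay else 0.0
        yd = y[k - delay] if k >= delay else 0.0
        y[k] = -g * x[k] + xd + g * yd
    return y

def decorrelate(x, delays=(7, 19, 41), g=0.5):
    """Cascade of all-pass sections (hypothetical delays and gain)."""
    y = np.asarray(x, dtype=float)
    for d in delays:
        y = allpass(y, d, g)
    return y

def allpass_magnitude(delay, g, num=512):
    """|H| of one section, H(z) = (-g + z^-delay) / (1 - g*z^-delay)."""
    w = np.linspace(0.0, np.pi, num)
    z = np.exp(-1j * w * delay)
    return np.abs((-g + z) / (1.0 - g * z))

def normalized_correlation(a, b):
    """Zero-lag correlation coefficient of two signals."""
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

rng = np.random.default_rng(1)
x = rng.standard_normal(8192)   # stand-in for a shifted LF signal
y = decorrelate(x)              # partially decorrelated patch candidate
```

The magnitude response of each section is unity, so the spectral envelope of the patch is left unchanged while its phase is altered. Such a short cascade only reduces the correlation with the input and does not remove it, which corresponds to a patch that is "at least partially" decorrelated.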

In embodiments of the invention, the degree of decorrelation can be fixedly determined or adjusted at the decoder-side, or it may be transmitted as a parameter from the encoder to the decoder. Furthermore, the entire patch may be decorrelated, or only specific portions of the patch. The portions of the patch to be decorrelated may also be transmitted as a parameter from the encoder to the decoder as part of the corresponding information added to the coded audio signal.

The inventive approach is beneficial when compared to conventional approaches for bandwidth extension since distortions and sound colorations by disturbing or parasitic envelope modulations, as they exist with current methods based on single-sideband modulation/copy-up of the LF band, are inherently avoided with the inventive approach. This is achieved by using HF patches that are decorrelated versions of the LF signal portion or that are completely uncorrelated with respect to the LF signal portion.

A scenario in which embodiments of the invention may be implemented is now described with reference to FIGS. 4 a and 4 b.

An encoder side is shown in FIG. 4 a and a decoder side is shown in FIG. 4 b. An audio signal is fed into a lowpass/highpass combination at an input 700. The lowpass/highpass combination on the one hand includes a lowpass (LP), to generate a lowpass filtered version of the audio signal, illustrated at 703 in FIG. 4 a. This lowpass filtered audio signal is encoded with an audio encoder 704. The audio encoder is, for example, an MP3 encoder (MPEG-1/2 layer 3) or an AAC encoder, described in the MPEG-2/4 standard. Alternative audio encoders providing a transparent or advantageously perceptually transparent representation of the band-limited audio signal 703 may be used in the encoder 704 to generate a completely encoded or perceptually encoded and perceptually transparently encoded audio signal 705, respectively. The upper band of the audio signal is output at an output 706 by the highpass portion of the filter 702, designated by “HP”. The highpass portion of the audio signal, i.e. the upper band or HF band, also designated as the HF portion, is supplied to a parameter calculator 707 which is implemented to calculate the different parameters (representing side information representing the high frequency portion of the audio signal). These parameters are, for example, the spectral envelope of the upper band 706 in a relatively coarse resolution, for example, by representation of a scale factor for each frequency group on a perceptually adapted scale (critical bands) e.g. for each Bark band on the Bark scale. A further parameter which may be calculated by the parameter calculator 707 is the noise floor in the upper band, whose energy per band may be related to the energy of the envelope in this band. Further parameters which may be calculated by the parameter calculator 707 include a tonality measure for each partial band of the upper band which indicates how the spectral energy is distributed in a band, i.e. whether the spectral energy in the band is distributed relatively uniformly, wherein then a non-tonal signal exists in this band, or whether the energy in this band is relatively strongly concentrated at a certain location in the band, wherein then rather a tonal signal exists for this band. Further parameters consist in explicitly encoding peaks relatively strongly protruding in the upper band with regard to their height and their frequency, as the bandwidth extension concept, in the reconstruction without such an explicit encoding of prominent sinusoidal portions in the upper band, will only recover the same very rudimentarily, or not at all.
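Two of the parameters named above, the coarse spectral envelope and a per-band tonality measure, can be sketched as follows. The source does not specify how the tonality measure is computed; spectral flatness (geometric mean over arithmetic mean of the power values) is used here purely as an assumed stand-in, and the bands are split uniformly instead of on a Bark scale. The function name and band layout are hypothetical.

```python
import numpy as np

def band_parameters(power_spectrum, num_bands):
    """Per-band energy and spectral flatness of an upper-band power spectrum.

    Flatness near 1 indicates uniformly distributed energy (non-tonal),
    flatness near 0 indicates energy concentrated at one location (tonal).
    """
    bands = np.array_split(np.asarray(power_spectrum, dtype=float), num_bands)
    eps = 1e-12  # avoids log(0) for empty bins
    energies = np.array([b.sum() for b in bands])
    flatness = np.array([
        np.exp(np.mean(np.log(b + eps))) / (np.mean(b) + eps) for b in bands
    ])
    return energies, flatness

# Usage: one noise-like band followed by one band holding a single peak.
power = np.concatenate([np.ones(8), [0, 0, 0, 8, 0, 0, 0, 0]])
energies, flatness = band_parameters(power, num_bands=2)
```

Both bands in the usage example carry the same energy, so a scale factor alone could not distinguish them; the flatness value separates the uniformly filled band from the band with a single protruding peak.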

In any case, the parameter calculator 707 is implemented to generate only parameters 708 for the upper band which may be subjected to similar entropy reduction steps as they may also be performed in the audio encoder 704 for quantized spectral values, such as for example differential encoding, prediction or Huffman encoding, etc. The parameter representation 708 and the audio signal 705 are then supplied to a datastream formatter 709 which is implemented to provide an output side datastream 710 which will typically be a bitstream according to a certain format as it is for example standardized in the MPEG-4 Standard.

The decoder side, as it may be suitable for the present invention, is shown in FIG. 4 b. The datastream 710 enters a datastream interpreter 711 which is implemented to separate the parameter portion 708 from the audio signal portion 705. The parameter portion 708 is decoded by a parameter decoder 712 to obtain decoded parameters 713. In parallel to this, the audio signal portion 705 is decoded by an audio decoder 714 to obtain the audio signal 777 which was illustrated at 8 in FIG. 6, for example.

Depending on the implementation, audio signal 777 may be output via a first output 715. At the output 715, an audio signal with a small bandwidth and thus also a low quality may then be obtained. For a quality improvement, however, bandwidth extension 720 may be performed making use of the inventive approach as described in the following referring to FIGS. 1 a, 1 b and 2 to obtain the audio signal 112 on the output side with an extended or high bandwidth, respectively, and a high quality.

One embodiment of an inventive apparatus for reproducing an audio signal and, thereby extending the bandwidth thereof, is shown in FIG. 1 a. The apparatus comprises a first reproducer 100, a provider 102, a combiner 104 and a second reproducer 106. Optionally, a transition detector 108 may be provided. The first reproducer 100 receives at an input thereof first data 120 representing a coded version of a first portion of audio data in a first frequency band. For example, the first data 120 may correspond to audio signal portion 705 shown in FIG. 4 b. The first reproducer 100 reproduces the audio signal in the first frequency band based on the first data 120. For example, the first reproducer 100 may be formed by the audio decoder 714 shown in FIG. 4 b. The first reproducer 100 outputs the audio signal in the first frequency band, which may correspond to audio signal 777 shown in FIG. 4 b. Audio signal 777 is applied to provider 102, which provides for a patch signal 122 in the second frequency band. The patch signal 122 is at least partially uncorrelated with respect to the first portion of the audio signal 777 or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band. The audio signal 777 and the patch signal 122 are combined, such as added, in combiner 104. The combined signal 124 is output and applied to the second reproducer 106. The second reproducer 106 receives the combined signal 124 and second data 126 representing side information on a second portion of the audio signal in a second frequency band. For example, the second data 126 may correspond to decoded parameters 713 described above with respect to FIG. 4 b. The second reproducer 106 reproduces the audio signal in the second frequency band based on the patch signal (within the combined signal 124) and based on the second data 126.

In embodiments of the invention, the first frequency band may correspond to the frequency range associated with the first portion of the audio signal shown in FIG. 7 a, and the second frequency band may correspond to the frequency range associated with the second portion of the audio signal shown in FIG. 7 a.

According to the embodiment shown in FIG. 1 a, the second reproducer 106 outputs a reproduced audio signal 128 with a high bandwidth.

In the alternative embodiment shown in FIG. 1 b, the output of provider 102 is coupled to the second reproducer 106 and the output of second reproducer 106 is coupled to combiner 104. Thus, according to the embodiment shown in FIG. 1 b, an audio signal 130 in the second frequency band is reproduced from the patch signal provided by provider 102 prior to combining the patch signal with the first portion 777 of the audio signal. Again, the second reproducer reproduces the audio signal 130 in the second frequency band based on the second data 126 and the patch signal 122. According to the embodiment shown in FIG. 1 b, the combiner 104 outputs the reproduced audio signal 128.

In embodiments of the invention, the provider comprises a shifting unit and a decorrelator, which are configured to generate the patch signal as a decorrelated version of the first portion of the audio signal shifted to the second frequency band. In embodiments of the invention, the provider is configured to provide a synthetic patch signal which is uncorrelated with respect to the first portion of the audio signal. In embodiments of the invention, the provider is configured to provide a plurality of patch signals for a plurality of higher frequency bands. In such embodiments the second reproducer and the combiner are adapted to reproduce a plurality of second signal portions and to combine the plurality of signal portions into the reproduced audio signal.

An embodiment of an apparatus for reproducing an audio signal using bandwidth extension, which uses decorrelated sub-band audio signals, is shown in FIG. 2. The apparatus receives a baseband signal from the core codec, which may be signal 777 shown in FIG. 4 b. Signal 777 is applied to a shifting unit 200. Shifting unit 200 is configured to shift signal 777 from the low-frequency range to a high-frequency range, such as from a frequency range associated with the low-frequency portion 4 in FIG. 7 a to the frequency range associated with the high-frequency portion 6 in FIG. 7 a.

Shifting unit 200 may be configured to simply copy-up signal portion 777 to the high-frequency range in the frequency domain. Alternatively, shifting unit 200 may be implemented as a single sideband modulation unit configured to perform a single sideband modulation in the time domain in order to shift the first portion of the audio signal from the first frequency band to the second frequency band.
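
A copy-up in the frequency domain may be sketched as follows. The sketch assumes the spectrum is a plain list of bins with the first frequency band occupying bins 0 to lf_bins - 1; the function name and parameters are hypothetical. The single sideband modulation alternative is not shown.

```python
def shift_to_band(spectrum, lf_bins, target_start):
    """Copy the low-band bins [0, lf_bins) up to [target_start, target_start + lf_bins).

    Returns a spectrum of the same length that is zero outside the target
    band, i.e. only the shifted version of the first portion.
    """
    shifted = [0.0] * len(spectrum)
    for k in range(lf_bins):
        target = target_start + k
        if target < len(shifted):       # drop bins that would exceed the spectrum
            shifted[target] = spectrum[k]
    return shifted


# Baseband occupying the lower four of eight bins.
baseband = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
shifted = shift_to_band(baseband, lf_bins=4, target_start=4)
```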

The shifted first portion of the audio signal is applied to a decorrelation unit 202 a. The shifted decorrelated first portion of the audio signal is output by the decorrelation unit 202 a as a patch signal 204. The patch signal 204 is applied to a patching unit 206, in which the patch signal 204 is combined with the first portion 777 of the audio signal. For example, the patch signal and the first portion of the audio signal are concatenated or added in patching unit 206. The combined signal 208 is output from patching unit 206 and applied to a post-processing unit 210.

Post-processing unit 210 receives second data 212 and represents a second reproducer configured to reproduce the second portion of the audio signal in a second frequency band based on the second data 212 and the patch signal 204 (which is included in the combined signal 208). Again, the second data 212 represent side information and may correspond to decoded parameters 713 explained above with respect to FIG. 4 b. A fullband output 214 of post-processing unit 210 represents the reproduced audio signal.

In the embodiment shown in FIG. 2, shifting unit 200 and decorrelation unit 202 a represent a provider configured to provide a patch signal 204.

In embodiments of the invention, shifting unit 200 may be configured to shift the first portion 777 of the audio signal into a plurality of p different frequency bands. A decorrelation unit 202 a-202 p may be provided for each shifted version in order to provide p patch signals. In case more than one patch is used (such as p patches), the p patches should be uncorrelated among each other and with the LF band. Then, the shifted versions associated with each frequency band are combined within patching unit 206. Second data representing side information for each of the higher frequency bands may be provided to the post-processing unit 210 so that a plurality of higher frequency portions of the audio signal are reproduced in post-processing unit 210.
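
The generation of p patches may be sketched as follows. As an assumption, each patch is derived from complex low-band coefficients by an independent random phase rotation per coefficient, with a separate seeded generator per patch, and the patching step is modelled as concatenation. All names and the seeding scheme are hypothetical, and the sketch does not measure how much decorrelation is actually achieved.

```python
import cmath
import random


def make_patches(lf_coeffs, p, seed=0):
    """Return p patches, each a phase-rotated copy of the low-band coefficients.

    Every patch uses its own random generator so that the phase rotations
    differ from patch to patch (illustrative assumption).
    """
    patches = []
    for i in range(p):
        rng = random.Random(seed + i)
        patch = [c * cmath.exp(1j * rng.uniform(-cmath.pi, cmath.pi))
                 for c in lf_coeffs]
        patches.append(patch)
    return patches


def assemble(lf_coeffs, patches):
    """Stand-in for the patching unit: place the patches above the low band."""
    full = list(lf_coeffs)
    for patch in patches:
        full.extend(patch)
    return full


lf = [1 + 0j, 0.5j, -0.25 + 0j, 0.125 + 0.125j]
patches = make_patches(lf, p=3)
fullband = assemble(lf, patches)
```

A pure phase rotation leaves the magnitude of every coefficient unchanged, which is why it is a candidate when the spectral envelope of the first portion is to be preserved.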

In embodiments of the invention, the first and second frequency bands (and the optional further frequency bands) may overlap or may not overlap in the frequency direction.

Accordingly, in embodiments of the invention, the provider comprises a shifter unit configured to shift a first portion of an audio signal in a first frequency band to a second frequency band or to a plurality of different second frequency bands, and a decorrelator for decorrelating the shifted version of the first portion of the audio signal from the first portion of the audio signal. In embodiments of the invention, the decorrelator may have the same properties as known for example from spatial audio coding decorrelation. In the embodiments of the invention, the decorrelator may provide a sufficient decorrelation in order to avoid the signal distortions and artifacts which are typical for conventional bandwidth extensions using spectral band replication. The decorrelator may provide for a preservation of the spectral envelope of the first portion of the audio signal and/or may provide for a preservation of the temporal envelope, i.e. the transients, of the first portion of the audio signal. Designing an appropriate decorrelator thus might typically involve a trade-off to be made between transient preservation and decorrelation.

In embodiments of the invention, the decorrelator may be implemented as an IIR (IIR=infinite impulse response) filter in the time domain or sub-band time domain, e.g. an all-pass filter, in which decorrelation is achieved via group-delay variations. In embodiments of the invention, the decorrelator may be configured to provide for phase randomization of spectral coefficients in a complex (oversampled) transform/filterbank representation (DFT, QMF representation) (DFT=discrete Fourier transform; QMF=quadrature mirror filter). In embodiments of the invention, the decorrelator may be configured to apply a frequency-dependent time delay in a filterbank representation.
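
The all-pass option can be illustrated with a first-order IIR all-pass section. The difference equation and the coefficient value below are textbook choices, not taken from the description, and a practical decorrelator would likely need a higher-order or cascaded structure to obtain sufficient group-delay variation.

```python
def allpass_first_order(x, a):
    """First-order all-pass: y[n] = -a*x[n] + x[n-1] + a*y[n-1].

    For |a| < 1 the magnitude response is 1 at every frequency, so only the
    phase (and hence the group delay) of the input is altered.
    """
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = -a * sample + x_prev + a * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y


# The impulse response is smeared in time but keeps (almost) unit energy;
# the truncation to 64 samples only loses a negligible tail.
impulse = [1.0] + [0.0] * 63
response = allpass_first_order(impulse, a=0.5)
energy = sum(v * v for v in response)
```

The time smearing visible in the impulse response is the reason such a filter trades transient preservation against decorrelation.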

Embodiments of the invention may comprise a signal adaptive decorrelator that varies the degree of decorrelation in order to preserve transients. A high decorrelation may be provided for quasi-stationary signals, and a low decorrelation may be provided for transient signals. Accordingly, in embodiments of the invention, the provider for providing the patch signal may be switchable between different degrees of decorrelation.

In embodiments, the provider for providing the patch signal may be switchable between different degrees of decorrelation depending on whether the first signal portion comprises an indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal. Examples of such an indicator are a transient in the first portion of the audio signal, voiced speech consisting of pulse trains in the first portion of the audio signal and/or the sound of brass instruments in the first portion of the audio signal. In the following, embodiments are described in which the indicator is a transient in the first portion of the audio signal.

In embodiments of the invention, the apparatus may comprise a detector configured to detect whether the first portion of the audio signal comprises a transient. Such a detector 108 is schematically shown in FIGS. 1 a and 1 b. Depending on the output signal of detector 108, provider 102 may be configured to provide the patch signal with a high decorrelation for quasi-stationary signals (i.e. when the first portion of the audio signal does not have a transient), and with a low decorrelation if the first portion of the audio signal has transient signals.

In alternative embodiments of the invention, the apparatus may comprise a signal adaptive decorrelator that is activated for quasi-stationary signals and deactivated for transient signal portions. In other words, the provider may be configured to output the shifted first signal portion without decorrelation thereof in case the first signal portion comprises transient signal portions and to output the decorrelated patch signal only in case the first signal portion does not comprise transients or transient signal portions. In such embodiments, the second reproducer is configured to reproduce the audio signal in the second frequency band based on the second data and the patch signal if the first portion of the audio signal does not comprise a transient and is configured to reproduce the audio signal in a second frequency band based on the second data and a version of the first portion of the audio signal, which has been shifted to the second frequency band and which has not been decorrelated, if the first portion of the audio signal comprises a transient.
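
The on/off behaviour of such a signal adaptive decorrelator may be sketched as a simple selection. The function provide_patch and the sign-inverting stand-in decorrelator are hypothetical and serve only to make the switching visible; any real decorrelator would be passed in its place.

```python
def provide_patch(shifted, decorrelate, has_transient):
    """Return the patch signal for one signal portion.

    shifted       -- first portion already shifted to the second frequency band
    decorrelate   -- callable implementing some decorrelator (assumption)
    has_transient -- detector output for the first portion
    """
    if has_transient:
        return list(shifted)            # decorrelator deactivated
    return decorrelate(shifted)         # decorrelator activated


def toy_decorrelator(samples):
    """Placeholder only: a sign flip is NOT a real decorrelator."""
    return [-v for v in samples]


stationary_patch = provide_patch([1.0, 2.0], toy_decorrelator, has_transient=False)
transient_patch = provide_patch([1.0, 2.0], toy_decorrelator, has_transient=True)
```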

A transient or transient portion may be regarded as a substantial overall change of the audio signal, e.g. a change of the energy of the audio signal (an increase or a decrease) by more than 50% from one temporal portion to the next temporal portion. The 50% threshold is only an example, however, and smaller or greater values may also be used. Alternatively, for a transient detection, the change of energy distribution may also be considered, e.g. in the transition from a vowel to a sibilant.
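
An energy-based transient detection along these lines may be sketched as follows. The frame length, the handling of silent frames and the function names are assumptions; the 50% threshold is the example value given above.

```python
def frame_energies(samples, frame_len):
    """Energy of consecutive, non-overlapping frames (incomplete tail dropped)."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def detect_transients(samples, frame_len, threshold=0.5):
    """Flag a frame when its energy differs from the previous frame's energy
    by more than `threshold` (relative change, increase or decrease)."""
    energies = frame_energies(samples, frame_len)
    flags = [False] if energies else []     # the first frame has no predecessor
    for prev, cur in zip(energies, energies[1:]):
        if prev == 0.0:
            flags.append(cur > 0.0)         # assumption: onset after silence counts
        else:
            flags.append(abs(cur - prev) / prev > threshold)
    return flags


# A quiet frame followed by two loud frames of equal energy.
signal = [0.1] * 8 + [1.0] * 8 + [1.0] * 8
flags = detect_transients(signal, frame_len=8)
```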

In embodiments of the invention, the provider may be configured to provide a synthetic patch signal which is uncorrelated with respect to the first portion of the audio signal. In other words, patching with an uncorrelated synthetic patch signal (such as synthetic noise) might already be sufficient if the parametric post-processing is fine-grained (high bit-rate codec scenario) or if the signal's HF band is noise-like anyway.

In embodiments of the invention, a correlation of the LF band and the HF band within a bandwidth extension (like SBR) is nevertheless helpful for enhancing a too coarse time grid of parametric post-processing (e.g. due to a low bit-rate codec scenario), an accurate reproduction of transients, and a preservation of tones that have a rich overtone structure (usually, tonality is not affected by decorrelation and thus the preservation of tonality does not pose a problem in designing a decorrelator).

As far as decorrelators known e.g. from spatial audio coding decorrelation are concerned, reference is made to WO 2007/118583 A1, for example.

In embodiments of the invention, provider 102 may comprise an adaptive decorrelator, which adjusts decorrelation of the HF patches based on a parameter transmitted from an encoder to the decoder. In such embodiments, the apparatus is configured for reproducing an audio signal based on the first data, the second data and third data comprising information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion is reproduced when reproducing the audio signal from the coded audio signal. Such third data may be added to coded audio data on the encoder side, such as by a decorrelation information adder 300 shown in FIG. 3 of the present application. The apparatus shown in FIG. 3 corresponds to the apparatus shown in FIG. 4 a except for the decorrelation information adder.

The decorrelation information adder 300 receives the output of low-pass filter 702 and may detect properties from the output signal of low-pass filter 702. For example, the decorrelation information adder may detect transients in the output signal of the low-pass filter 702. Depending on the properties of the output of low-pass filter 702, the decorrelation information adder adds to the coded audio signal 710 information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion is reproduced when reproducing the audio signal from the coded audio signal. For example, the decorrelation information may instruct the provider at the decoder-side to perform a low decorrelation or no decorrelation at all in case there are transient portions in the low-frequency portion of the audio signal.

In embodiments of the invention, the decorrelation information adder may also receive the high-frequency portion 706 of the audio signal and may be configured to derive properties therefrom. For example, in case the decorrelation information adder detects that the HF band is noise-like, it may advise the provider on the decoder-side to provide the patch signal based on a synthetic noise signal.

In such embodiments, the coded audio signal 320 represented by data stream 710 comprises first data 321 representing a coded version of a first portion of an audio signal, second data 322 representing side information on a second portion of the audio signal in a second frequency band, and information 323 on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion is reproduced when reproducing the audio signal from the coded audio signal.
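
The three components of such a coded audio signal, and an encoder-side rule for setting the decorrelation information, may be sketched as follows. The container type, the enumeration values, the byte-string payloads and the priority of the transient check over the noise check are all illustrative assumptions; no bitstream syntax is defined by the description.

```python
from dataclasses import dataclass
from enum import Enum


class Decorrelation(Enum):
    """Hypothetical signalling values for the decorrelation information."""
    NONE = 0
    LOW = 1
    HIGH = 2
    SYNTHETIC_NOISE = 3


@dataclass(frozen=True)
class CodedAudioSignal:
    first_data: bytes             # coded version of the first portion (321)
    second_data: bytes            # side information on the second portion (322)
    decorrelation: Decorrelation  # information on the degree of decorrelation (323)


def add_decorrelation_info(first_data, second_data,
                           lf_has_transient, hf_is_noise_like):
    """Stand-in for the decorrelation information adder 300."""
    if lf_has_transient:
        degree = Decorrelation.LOW              # preserve the transient
    elif hf_is_noise_like:
        degree = Decorrelation.SYNTHETIC_NOISE  # patch from synthetic noise
    else:
        degree = Decorrelation.HIGH             # quasi-stationary signal
    return CodedAudioSignal(first_data, second_data, degree)


coded = add_decorrelation_info(b"core", b"side",
                               lf_has_transient=True, hf_is_noise_like=False)
```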

Accordingly, embodiments of the invention provide for an improved approach for reproducing an audio signal, i.e. for a decoder-side extension of the audio signal bandwidth. In other embodiments, the invention provides for an apparatus for generating a coded audio signal. In yet other embodiments, the invention relates to such coded audio signals.

The advantageous effect achieved by the inventive approach can be made visible by a comparison of the autocorrelation sequence of the noise signal envelope for copy-up SBR (shown in FIG. 5 a) with the autocorrelation sequence of the noise signal envelope of decorrelated patches as shown in FIG. 5 b of the present application. FIG. 5 b is the autocorrelation function of the magnitude envelope of white noise, wherein the bandwidth is extended with three patches uncorrelated among each other and to the LF band. FIG. 5 b clearly shows the disappearance of the unwanted side maxima shown in FIG. 5 a.
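
The following sketch does not reproduce the computation behind FIGS. 5 a and 5 b. It only illustrates, under simplifying assumptions, the inter-band correlation that decorrelated patches are meant to remove: the magnitude envelope of a copied-up noise band is perfectly correlated with the envelope of the LF band, whereas the envelope of an independent noise band is nearly uncorrelated with it. The complex Gaussian noise model, the sample count and all names are assumptions.

```python
import random


def pearson(a, b):
    """Normalized (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def noise_envelope(n, rng):
    """Magnitudes of n complex Gaussian sub-band samples (white-noise model)."""
    return [abs(complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
            for _ in range(n)]


rng = random.Random(1)
lf_env = noise_envelope(4000, rng)
copied_env = list(lf_env)                    # copy-up: identical envelope
independent_env = noise_envelope(4000, rng)  # stand-in for a decorrelated patch

corr_copy = pearson(lf_env, copied_env)
corr_indep = pearson(lf_env, independent_env)
```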

The present application is applicable or suitable for all audio applications in which the full bandwidth is not available. The inventive approach may find use in the distribution or broadcasting of audio content such as, for example, with digital radio, internet streaming and audio communication applications. Embodiments of the invention are related to a bandwidth extension using decorrelated sub-band audio signals.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may for example be stored on a tangible machine readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine readable carrier or a non-transitory storage medium.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may for example be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer, or a programmable logic device, configured to or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are advantageously performed by any hardware apparatus.

While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

1. An apparatus for reproducing an audio signal based on first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band comprising frequencies higher than the first frequency band, said device comprising: a first reproducer configured to reproduce the first portion of the audio signal based on the first data; a provider configured to provide a patch signal in the second frequency band, wherein the patch signal is at least partially uncorrelated with respect to the first portion of the audio signal or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band; a second reproducer representing a post-processor and configured to reproduce the second portion of the audio signal in the second frequency band based on the second data and the patch signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data; and a combiner to combine the reproduced first portion of the audio signal and the patch signal before the second portion of the audio signal is reproduced by the second reproducer or to combine the reproduced first portion of the audio signal and the reproduced second portion of the audio signal.
2. The apparatus of claim 1, wherein the second reproducer is configured to reproduce the audio signal in the second frequency band based on the second data and the patch signal if the first portion of the audio signal does not comprise an indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal and wherein the second reproducer is configured to reproduce the audio signal in the second frequency band based on the second data and a version of the first portion of the audio signal, which has been shifted to the second frequency band and which has not been decorrelated, if the first portion of the audio signal comprises an indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal.
3. The apparatus of claim 1, wherein the provider is configured to provide a synthetic patch signal which is uncorrelated with respect to the first portion of the audio signal.
4. The apparatus of claim 3, wherein the synthetic patch signal is a noise signal.
5. The apparatus of claim 1, wherein the provider comprises a shifting unit and a decorrelator, which are configured to generate the patch signal as a decorrelated version of the first portion of the audio signal shifted to the second frequency band.
6. The apparatus of claim 5, wherein the decorrelator is configured to preserve at least one of a spectral envelope of the first portion of the audio signal and a temporal envelope of the first portion of the audio signal.
7. The apparatus of claim 5, wherein the decorrelator comprises one of: an all-pass filter configured to cause group-delay variations in the first portion of the audio signal; a phase randomizer configured to cause phase randomization of spectral coefficients of the first portion of the audio signal; and an applicator configured to apply a frequency-dependent time delay to sub-portions the first portion of the audio signal.
8. The apparatus of claim 5, wherein the decorrelator comprises a signal adaptive decorrelator configured to vary the degree of decorrelation in order to apply a higher decorrelation if the first portion of the audio signal does not comprise an indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal and to apply a lower decorrelation or not to apply a decorrelation if the first portion of the audio signal comprises an indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal.
9. The apparatus of claim 2, comprising a detector configured to detect whether the first signal portion of the audio signal comprises the indicator for a strong correlation between the first portion of the audio signal and the second portion of the audio signal.
10. The apparatus of claim 1, wherein the provider is configured to provide a second patch signal in a third frequency band, wherein the second patch signal is uncorrelated with respect to the first portion of the audio signal or is a decorrelated version of the first portion of the audio signal, which has been shifted to the third frequency band, wherein the second patch signal is uncorrelated or decorrelated with respect to the first patch signal, wherein the apparatus comprises a third reproducer, wherein the third reproducer is configured to reproduce a third portion of the audio signal based on the second patch signal and third data representing side information on the third portion of the audio signal in the third frequency band, the third frequency band comprising frequencies higher than the second frequency band.
11. A method for reproducing an audio signal based on first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band comprising frequencies higher than the first frequency band, said method comprising: reproducing the audio signal in the first frequency band based on the first data; providing a patch signal in the second frequency band, wherein the patch signal is at least partially uncorrelated with respect to the first portion of the audio signal or is at least partially a decorrelated version of the first portion of the audio signal, which has been shifted to the second frequency band; reproducing the second portion of the audio signal in the second frequency band based on the second data and the patch signal by means of a post-processor, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data; and combining the reproduced first portion of the audio signal and the patch signal before the second portion of the audio signal is reproduced or combining the reproduced first portion of the audio signal and the reproduced second portion of the audio signal.
12. An apparatus for generating a coded audio signal, the coded audio signal comprising first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band comprising frequencies higher than the first frequency band, comprising: a decorrelation information adder configured to add to the coded audio signal in addition to the first data and the second data information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion of the audio signal is reproduced by means of a post-processor when reproducing the audio signal from the coded audio signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data.
13. A method for generating a coded audio signal, the coded audio signal comprising first data representing a coded version of a first portion of the audio signal in a first frequency band and second data representing side information on a second portion of the audio signal in a second frequency band, the second frequency band comprising frequencies higher than the first frequency band, comprising: adding to the coded audio signal in addition to the first data and the second data information on a degree of decorrelation to be used between the first portion of the audio signal and a patch signal based on which the second portion of the audio signal is reproduced by means of a post-processor when reproducing the audio signal from the coded audio signal, wherein a spectral envelope of the second portion of the audio signal, a noise floor in the second portion of the audio signal, a tonality measure for each partial band in the second portion of the audio signal, and an explicit coding of prominent sinusoidal portions in the second portion of the audio signal represent side information represented by the second data.
14. A computer program comprising program code for performing a method according to claim 11 when the computer program runs on a computer.
15. A computer program comprising program code for performing a method according to claim 13 when the computer program runs on a computer.